The Sherrington-Kirkpatrick model of spin glasses, the Hopfield
model of neural networks and the Ising spin glass are all models
of binary data belonging to the one-parameter exponential family
with quadratic sufficient statistic. Under bare minimal conditions,
we establish the $\sqrt{N}$-consistency of the maximum pseudolikelihood
estimate of the natural parameter in this family, even at critical
temperatures. Since very little is known about the low and critical
temperature regimes of these extremely difficult models, the proof
requires several new ideas. The author's version of Stein's method is
a particularly useful tool. We aim to introduce these techniques into
the realm of mathematical statistics through an example and present
some open questions.



1. Introduction and main result.

1.1. Statement of the problem. Suppose our data is a vector of dependent
$\pm 1$-valued random variables, denoted by $\sigma = (\sigma_1, \ldots, \sigma_N)$.
A simple and natural way of modeling the dependence is to consider the
one-parameter exponential family where the sufficient statistic is a
quadratic form. Explicitly, this means that we have a collection of real
numbers $(J^N_{ij})_{1 \le i < j \le N}$ defining a parametric family
$(P_\beta)_{\beta \ge 0}$ of probability distributions on $\{-1,1\}^N$ in the
following way: For any $\tau \in \{-1,1\}^N$ and any $\beta \ge 0$,

$$P_\beta\{\sigma = \tau\} = 2^{-N}\exp(\beta H_N(\tau) - N\psi_N(\beta)), \tag{1}$$

where

$$H_N(\tau) := \sum_{1 \le i < j \le N} J^N_{ij}\tau_i\tau_j. \tag{2}$$
S. CHATTERJEE

Here $\beta$ is the only unknown parameter. The function $\psi_N$ is determined by
the normalizing condition $\sum_\tau P_\beta\{\sigma = \tau\} = 1$. For future use, we let
$J^N_{ji} = J^N_{ij}$ and $J^N_{ii} = 0$ for each $i$ and $j$ and let $J^N$ be the
matrix $(J^N_{ij})_{1 \le i,j \le N}$.
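
To make the definitions (1) and (2) concrete, the following is a minimal sketch that evaluates $H_N$ and $\psi_N$ by brute-force enumeration of all $2^N$ configurations. The helper names (`all_configs`, `hamiltonian`, `psi`) and the randomly generated coupling matrix are illustrative assumptions, not part of the paper, and the approach is only feasible for very small $N$.

```python
import itertools
import numpy as np

def all_configs(N):
    """All 2^N spin configurations as rows of a (2^N, N) array of +/-1."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=N)))

def hamiltonian(J, configs):
    """H_N(tau) = sum_{i<j} J_ij tau_i tau_j = 0.5 * tau^T J tau, row by row.
    Assumes J is symmetric with zero diagonal."""
    return 0.5 * np.einsum("ti,ij,tj->t", configs, J, configs)

def psi(J, beta):
    """psi_N(beta) = (1/N) * log( 2^{-N} * sum_tau exp(beta * H_N(tau)) )."""
    N = J.shape[0]
    H = hamiltonian(J, all_configs(N))
    return (np.logaddexp.reduce(beta * H) - N * np.log(2.0)) / N

# Hypothetical example: a small random symmetric matrix with zero diagonal.
rng = np.random.default_rng(0)
N, beta = 8, 0.8
U = np.triu(rng.standard_normal((N, N)), 1)
J = (U + U.T) / np.sqrt(N)

# The probabilities in (1) should sum to one by the definition of psi_N.
H = hamiltonian(J, all_configs(N))
probs = 2.0 ** (-N) * np.exp(beta * H - N * psi(J, beta))
total = probs.sum()
print(total, psi(J, beta))
```

The enumeration costs $2^N$ evaluations, which illustrates why the normalizing function is impractical to compute exactly for data of realistic size.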

Many popular models, for example, the usual ferromagnetic Ising model 
[18, 21], the Sherrington-Kirkpatrick mean field model [24] of spin glasses 
and the Hopfield model [17] of neural networks all belong to the above family, 
with various special structures for $J^N$.

The main problem with the maximum likelihood estimation of $\beta$ is that
the function $\psi_N$ is generally not computable, theoretically or otherwise.
There are numerical methods for computing approximate MLEs (e.g., Geyer 
and Thompson [12]), but very little is known about the number of steps re- 
quired for convergence. Moreover, we do not know of any gradient-based 
algorithm which provably converges to a global maximum of the likelihood 
function, and there are serious doubts whether that is indeed possible. The 
Jerrum-Sinclair [20] algorithm for computing the normalizing constant con- 
verges in polynomial time, but requires the presence of a local dependency 
graph, which makes it inapplicable to spin glasses. Most importantly, even 
if one assumes that the MLE has somehow been approximated, general con- 
ditions for the consistency of the MLE are not available. 

1.2. The pseudolikelihood estimator. A natural recourse is to consider 
Julian Besag's maximum pseudolikelihood estimator (MPLE) [4, 5]. In short,
it gives the following prescription: Suppose we have a random vector
$X = (X_1, \ldots, X_n)$ whose distribution is parametrized by a parameter $\beta$.
Let $f_i(\beta, X)$ be the conditional probability density of $X_i$ given
$(X_j)_{j \ne i}$. Then the MPLE of $\beta$ is defined as

$$\hat\beta_{MPL} := \arg\max_\beta \prod_{i=1}^n f_i(\beta, X).$$

In our setting, the conditional density of $\sigma_i$ given the rest is easy to compute:

$$\log f_i(\beta, \sigma) = \beta\sigma_i\sum_{j=1}^N J^N_{ij}\sigma_j - \log\cosh\Bigl(\beta\sum_{j=1}^N J^N_{ij}\sigma_j\Bigr) - \log 2.$$

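Thus, the derivative of the log-pseudolikelihood function is

$$S_\sigma(\beta) := \frac{1}{N}\frac{\partial}{\partial\beta}\sum_{i=1}^N \log f_i(\beta,\sigma) = \frac{1}{N}\sum_{i=1}^N \Bigl(\sum_{j=1}^N J^N_{ij}\sigma_j\Bigr)\Bigl(\sigma_i - \tanh\Bigl(\beta\sum_{j=1}^N J^N_{ij}\sigma_j\Bigr)\Bigr). \tag{3}$$

Noting that $H_N(\sigma) = \frac{1}{2}\sum_{i,j=1}^N J^N_{ij}\sigma_i\sigma_j$, we use the above expression to define
the maximum pseudolikelihood estimate in this problem as

$$\hat\beta_N := \inf\Bigl\{x \ge 0 : H_N(\sigma) = \frac{1}{2}\sum_{i,j=1}^N J^N_{ij}\sigma_i\tanh\Bigl(x\sum_{k=1}^N J^N_{jk}\sigma_k\Bigr)\Bigr\}. \tag{4}$$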

The infimum of the empty set is defined to be $\infty$, as usual. It is not difficult
to show that the expression on the right-hand side is an increasing function
of $x$. Thus, it is not only feasible but extremely easy to compute $\hat\beta_N$, either
by Newton-Raphson or even a simple grid search.
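
As an illustration of how the estimate (4) can be computed in practice, here is a minimal sketch that uses bisection instead of Newton-Raphson or a grid search (a substitution made here for simplicity; the monotonicity noted above is what makes it work). The function name `mple`, the tolerance, and the small test matrix are assumptions made for this example only.

```python
import numpy as np

def mple(J, sigma, tol=1e-12):
    """Pseudolikelihood estimate (4): the smallest x >= 0 with
    sum_i m_i * tanh(x * m_i) = sum_i m_i * sigma_i, where m = J @ sigma.
    Returns np.inf when no such x exists (infimum of the empty set)."""
    m = J @ sigma                       # local fields m_i = sum_j J_ij sigma_j
    target = m @ sigma                  # equals 2 * H_N(sigma)

    def g(x):                           # nondecreasing in x, g(0) = -target
        return np.sum(m * np.tanh(x * m)) - target

    if abs(target) <= tol:
        return 0.0                      # x = 0 already solves the equation
    if target < 0:
        return np.inf                   # g > 0 on [0, inf): no solution
    if np.sum(np.abs(m)) - target <= tol:
        return np.inf                   # g < 0 for every finite x
    hi = 1.0
    while g(hi) < 0:                    # bracket the unique zero
        hi *= 2.0
    lo = 0.0
    for _ in range(200):                # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical 3-spin example with a finite estimate.
J = np.array([[0.0, 2.0, -1.0],
              [2.0, 0.0, 0.0],
              [-1.0, 0.0, 0.0]])
sigma = np.array([1.0, 1.0, 1.0])
b = mple(J, sigma)
m = J @ sigma
residual = np.sum(m * np.tanh(b * m)) - m @ sigma
print(b, residual)
```

The two early returns with `np.inf` correspond to the convention that the infimum of the empty set is $\infty$.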

1.3. The consistency result. Having defined the MPLE, it is natural to 
ask whether it is consistent. Unlike other exponential families, spin glass
models present the unique challenge that even the most basic characteristics,
like the correlations between spins at different sites (let alone weak
laws or central limit theorems), are completely intractable when $\beta$ is larger
than a threshold. Any attempt at proving a consistency result in these so- 
called "low temperature regimes" has to overcome the lack of almost any 
meaningful information about the behavior of the model. For instance, in 
most models of spin glasses, there is no known technique for proving that the 
function $S_\sigma$ defined in (3) converges to a nonrandom limit function (after
suitable normalization) as $N \to \infty$, unless $\beta$ is sufficiently small. The main
achievement of this paper is a new technique that bypasses these hurdles
and proves, with minimal information, that the MPLE is $\sqrt{N}$-consistent at
all temperatures. The most challenging part is to tackle the issues at the 
"critical temperatures," that is, points of phase transition. 

Recall the following definition from basic linear algebra: The $L^2$ operator
norm of a square matrix $A$ is defined as $\|A\| := \sup_{\|x\|=1}\|Ax\|$, where $\|x\|$
stands for the usual Euclidean norm of the vector $x$. If $A$ is a real symmetric
matrix, then $\|A\|$ is equal to the spectral radius of $A$. Our main result is the
following.

Theorem 1.1. Consider the exponential family of models (1) with the
quadratic sufficient statistic (2). Fix $\beta > 0$, and let $\hat\beta_N$ be the estimator (4).
Suppose we have a sequence of such models with $N \to \infty$, satisfying

(a) $\sup_N \|J^N\| < \infty$, and

(b) $\liminf_{N\to\infty} \psi_N(\beta) > 0$.

Then $(\hat\beta_N)_{N \ge 1}$ is a $\sqrt{N}$-consistent sequence of estimators for $\beta$.

Note that $\psi_N(\beta)$ is always an increasing nonnegative function, as long as
$\beta \ge 0$. Hence, if condition (b) is satisfied for some positive $\beta$, then it holds
for every $\beta' \ge \beta$.

Surprisingly, the MPLE may not be consistent at $\beta = 0$. A natural
counterexample, which also shows that $\hat\beta_N$ may not be consistent if condition
(b) is not satisfied, is provided by the Curie-Weiss model of ferromagnetic
interaction at high temperature. This is discussed in Section 1.7 at the end
of this section.

Some rigorous results are known about the consistency of the MPLE in 
settings with some kind of Markovian structure (e.g., lattice processes [10, 
13] and spatial point processes [19]), but the techniques of these papers 
cannot be used to prove Theorem 1.1. The reason is that the dependence in 
spin glasses is neither local nor mean field in the classical sense, and there 
is no way to extract any conditional independence. 

The main technique employed in this paper is a new version of Charles 
Stein's method of exchangeable pairs, developed by the author in [7, 8, 9]. 
The first step is the construction of an exchangeable pair $(\sigma, \sigma')$ and an
antisymmetric function $F$ such that $E(F(\sigma,\sigma') \mid \sigma)$ equals the pseudolikelihood score
function. The theory then gives a "temperature-free" method of showing 
that the derivative (3) of the log-pseudolikelihood function is close to zero 
with high probability at the true value of the parameter, resulting in the 
following lemma. 

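Lemma 1.2. Let $S_\sigma$ be the derivative of the log-pseudolikelihood defined
in (3). Then for any $\beta \ge 0$ we have

$$E_\beta(S_\sigma(\beta)^2) \le \frac{C}{N},$$

where $C = 6\|J^N\|^2 + 6\beta\|J^N\|^3 + 2\beta^2\|J^N\|^4$. Consequently,
$P_\beta\{|S_\sigma(\beta)| \ge \delta\} \le C/(N\delta^2)$ for any $\delta > 0$.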
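
The second moment bound of Lemma 1.2, $E_\beta(S_\sigma(\beta)^2) \le C/N$ with $C = 6\|J^N\|^2 + 6\beta\|J^N\|^3 + 2\beta^2\|J^N\|^4$, can be checked numerically on a toy instance. The sketch below computes the expectation exactly by enumerating all configurations; the coupling matrix, the value of $\beta$ and the variable names are assumptions chosen for illustration, and a single small instance is of course not a proof.

```python
import itertools
import numpy as np

# Hypothetical small instance: symmetric couplings with zero diagonal.
rng = np.random.default_rng(1)
N, beta = 8, 0.7
U = np.triu(rng.standard_normal((N, N)), 1)
J = (U + U.T) / np.sqrt(N)
normJ = np.linalg.norm(J, 2)            # L^2 operator (spectral) norm

configs = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
H = 0.5 * np.einsum("ti,ij,tj->t", configs, J, configs)

# Exact Gibbs probabilities P_beta{tau}, proportional to exp(beta * H(tau)).
w = np.exp(beta * H - np.max(beta * H))
w /= w.sum()

# Local fields m_i(tau) and the score S_tau(beta) for every configuration.
M = configs @ J                          # J symmetric, so row t holds m(tau_t)
S = np.mean(M * (configs - np.tanh(beta * M)), axis=1)

second_moment = np.sum(w * S ** 2)       # E_beta(S_sigma(beta)^2)
C = 6 * normJ ** 2 + 6 * beta * normJ ** 3 + 2 * beta ** 2 * normJ ** 4
print(second_moment, C / N)
```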

Having shown that, the next step is to prove consistency of the MPLE 
by inverting the pseudolikelihood function. This is usually an easy step in 
theoretical statistics; here, however, this is a very hard step because the most 
basic features of the model are unknown. To see that there are nontrivial 
issues involved, one can just look at the counterexample of the Curie-Weiss
model at high temperature, presented in Section 1.7, where Lemma 1.2 holds 
but the MPLE is not consistent. 

Theorem 1.1 has the desirable feature that the conditions (a) and (b) are 
satisfied in almost all commonly used models of spin glasses. We consider 
two examples. 

1.4. Application to the Sherrington-Kirkpatrick (S-K) model. In the 
Sherrington-Kirkpatrick model of spin glasses [24], we have

$$J^N_{ij} = \frac{g_{ij}}{\sqrt{N}},$$

where $(g_{ij})_{1 \le i < j < \infty}$ is a fixed realization of an array of independent standard
Gaussian random variables (and $g_{ji} = g_{ij}$). Some of the most important
questions about the S-K model have only recently been answered, mainly
due to the monumental efforts of Talagrand (see [25, 26], to mention a few)
and the important contributions of Guerra [14], Guerra and Toninelli [15],
Panchenko [22], Comets and Neveu [11] and other authors (e.g., [1, 2, 23]).
However, despite all the progress, the model is still far from tractable,
especially when $\beta > 1$. Thus it is quite remarkable that Theorem 1.1
applies with equal ease for all $\beta$, as we now show. It follows from a standard
result in random matrix theory (see, e.g., Bai [3], Theorem 2.12) that
condition (a) holds for almost every realization of $(g_{ij})_{i,j}$. For (b), we can use
the monotonicity of $\psi_N$, and the result [1] that $\lim \psi_N(\beta) = \beta^2/4$ for $\beta < 1$
in the S-K model (see Talagrand [25], Theorem 2.2.1), to conclude that
$\liminf \psi_N(\beta) > 0$ for every positive $\beta$.
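
Condition (a) for the S-K couplings can be explored numerically. The sketch below assumes the usual normalization $J^N_{ij} = g_{ij}/\sqrt{N}$ with a zero diagonal and prints the operator norm for a few sizes; under that assumption the norms should stay close to 2, consistent with the boundedness required by (a). The function name `sk_couplings` and the chosen sizes are illustrative.

```python
import numpy as np

def sk_couplings(N, rng):
    """Symmetric matrix with zero diagonal and off-diagonal entries g_ij / sqrt(N),
    where the g_ij are independent standard Gaussians (assumed normalization)."""
    U = np.triu(rng.standard_normal((N, N)), 1)
    return (U + U.T) / np.sqrt(N)

rng = np.random.default_rng(0)
norms = {N: np.linalg.norm(sk_couplings(N, rng), 2) for N in (100, 200, 400)}
for N, value in norms.items():
    print(N, value)                     # spectral norm of J^N
```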

1.5. Application to the Hopfield model. In the Hopfield model [17] for a 
system with N particles affected by M attractors, we have 

$$J^N_{ij} = \frac{1}{N}\sum_{k=1}^M \eta_{ik}\eta_{jk},$$

where $(\eta_{ik})_{i \le N, k \le M}$ is a fixed realization of a collection of independent
random variables with $P\{\eta_{ik} = \pm 1\} = 1/2$. This model is even less understood
than the S-K model. The groundbreaking contributions in the study of this 
model are mainly due to Bovier and Gayrard (see [6]) and Talagrand (com- 
piled in [25], Chapter 5). 

Again, it follows from random matrix theory (Bai [3], Section 2.2.2) that 
condition (a) is almost surely satisfied whenever M/N stays bounded as 
$N \to \infty$. The validity of (b) follows from spin glass theory (Talagrand [25],
Theorem 5.2.1). The conditions can similarly be verified for the other models 
mentioned in the abstract. 

1.6. Multiparameter models and open problems. The main shortcoming 
of Theorem 1.1 is that it applies only to one-parameter families. The good 
news is that the main tool, Lemma 1.2, can easily extend to multiparameter
families. However, as mentioned before, the surprising fact is that proving
consistency of $\hat\beta_N$ is very hard even after one has something like
Lemma 1.2. The argument that we use in this paper to move from Lemma
1.2 to the consistency of $\hat\beta_N$ is intensely one-dimensional. It is so specialized
that it breaks down even if we just add a simple linear term like $h\sum_{i=1}^N \sigma_i$
in the Hamiltonian.

The main open problems (currently under investigation by the author) 
are (i) to prove a version of Lemma 1.2 for multiparameter models, which 
should be doable by the techniques of this paper, and (ii) to deduce a
multiparameter version of Theorem 1.1 from this generalized form of Lemma 1.2,
which is likely to be quite hard if not impossible; the author does not know 
how to solve this part with the current tools. 

Besides these issues, there are also natural open questions about vari- 
ance bounds and central limit theorems for the pseudolikelihood estimates, 
with the tantalizing possibility that non-Gaussian limits hold at the critical 
temperatures. 

1.7. A counterexample. Theorem 1.1 may fail if condition (b) is not
satisfied. In particular, this may happen at $\beta = 0$. Counterexamples are easy
to construct if (a) is not satisfied; necessity of (b) is more subtle.

The simplest counterexample is provided by the Curie-Weiss model where
$J^N_{ij} = 1/N$ for all $i, j$. This is a well-known toy model of ferromagnetic
interaction. It is known that for $0 \le \beta < 1$, $\lim_{N\to\infty}\psi_N(\beta) = 0$ in this model
(see, e.g., [25], page 324). Let $m_N = \frac{1}{N}\sum_{i=1}^N \sigma_i$, and define the function

$$f_N(x) := \frac{S_\sigma(x)}{m_N^2} = 1 - \frac{\tanh(x m_N)}{m_N}.$$

Suppose $0 \le \beta < 1$. It is known that $m_N \to 0$ in probability in this regime
(again, see [25], page 324). Thus, $f_N(x) \to 1 - x$ in probability for each $x$.
Since the $f_N$'s are decreasing functions, and $\hat\beta_N$ is a zero of $f_N$, a standard
argument now gives that $\hat\beta_N \to 1$ in probability. Thus, the pseudolikelihood
estimate of $\beta$ in the Curie-Weiss model is not consistent when $0 \le \beta < 1$.
Note that Lemma 1.2 continues to hold, though.
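
The mechanism of the counterexample can be made explicit with a short worked computation. Assuming $f_N(x) = 1 - \tanh(x m_N)/m_N$ as in the Curie-Weiss discussion above, its zero is $\operatorname{arctanh}(m_N)/m_N$, which tends to 1 as the magnetization $m_N$ tends to 0, whatever the true $\beta$ is. The function names below are illustrative, and the magnetization is passed in as a plain number rather than simulated.

```python
import numpy as np

def f_N(x, m):
    """f_N(x) = 1 - tanh(x * m) / m for a given magnetization 0 < |m| < 1."""
    return 1.0 - np.tanh(x * m) / m

def zero_of_f_N(m):
    """Solve tanh(x * m) = m for x, valid for 0 < |m| < 1."""
    return np.arctanh(m) / m

for m in (0.5, 0.1, 0.01):
    x_star = zero_of_f_N(m)
    print(m, x_star, f_N(x_star, m))    # zero approaches 1 as m shrinks
```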

2. A finite sample result and proofs. In this section $N$ is fixed.
Consequently, subscripts and superscripts involving $N$ are unnecessary. Let $J$ be
an $N \times N$ nonzero symmetric matrix with zeros on the diagonal. As before,
consider the one-parameter exponential family of probability distributions
$(P_\beta)_{\beta \ge 0}$ on $\{-1,1\}^N$, defined as

$$P_\beta(\{\tau\}) = 2^{-N}\exp(\beta H(\tau) - N\psi(\beta)),$$

where the sufficient statistic $H$ has the form

$$H(\tau) = \sum_{1 \le i < j \le N} J_{ij}\tau_i\tau_j.$$

As usual $\tau$ will always denote a typical element of $\{-1,1\}^N$ and $\sigma$ will be
reserved for random elements. Now for each $i$, let us define the function
$m_i : \{-1,1\}^N \to \mathbb{R}$ as

$$m_i(\tau) = \sum_{j=1}^N J_{ij}\tau_j. \tag{5}$$

Note that $m_i(\tau)$ does not depend on $\tau_i$ because $J_{ii} = 0$ by assumption. It is
not difficult to verify that if $\sigma \sim P_\beta$, then

$$E_\beta(\sigma_i \mid (\sigma_j)_{j \ne i}) = \tanh(\beta m_i(\sigma)).$$

The quantity $m_i(\tau)$ is called the local field at site $i$ in the configuration $\tau$.
Note that $H(\tau) = \frac{1}{2}\sum_i m_i(\tau)\tau_i$ in the class of models that we are considering.
Next, for each $\tau \in \{-1,1\}^N$, we define a map $S_\tau : [0,\infty) \to \mathbb{R}$ as

$$S_\tau(x) = \frac{1}{N}\sum_{1 \le i \le N} m_i(\tau)(\tau_i - \tanh(x m_i(\tau))). \tag{6}$$

Note that this is the same as the derivative of the log-pseudolikelihood
defined in (3). Interpreting $\tanh(\pm\infty)$ as $\pm 1$, we can continuously extend $S_\tau$
to $[0,\infty]$ by defining

$$S_\tau(\infty) = \frac{1}{N}\sum_{1 \le i \le N}(m_i(\tau)\tau_i - |m_i(\tau)|).$$

Note that $S_\tau(\infty) \le 0$ for all $\tau$. Finally, let

$$\hat\beta(\tau) := \inf\{x \ge 0 : S_\tau(x) = 0\}. \tag{7}$$

It is easy to check that this is exactly the estimate defined in (4). When $\sigma$
is a random element picked from the measure $P_\beta$, we will usually just write
$\hat\beta$ instead of $\hat\beta(\sigma)$.
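
The conditional mean identity $E_\beta(\sigma_i \mid (\sigma_j)_{j \ne i}) = \tanh(\beta m_i(\sigma))$ used throughout this section can be verified directly on a small instance. The sketch below compares both sides for every configuration of the remaining spins; the matrix, the site index `i` and the helper name `gibbs_weight` are assumptions made for this example.

```python
import itertools
import numpy as np

def gibbs_weight(J, beta, tau):
    """Unnormalized Gibbs weight exp(beta * H(tau)) with H(tau) = 0.5 * tau^T J tau."""
    return np.exp(beta * 0.5 * tau @ J @ tau)

# Hypothetical small instance with zero diagonal.
rng = np.random.default_rng(4)
N, beta, i = 6, 0.9, 2
U = np.triu(rng.standard_normal((N, N)), 1)
J = (U + U.T) / np.sqrt(N)

max_err = 0.0
for rest in itertools.product([-1.0, 1.0], repeat=N - 1):
    up = np.insert(np.array(rest), i, 1.0)      # spin i set to +1
    down = np.insert(np.array(rest), i, -1.0)   # spin i set to -1
    wu, wd = gibbs_weight(J, beta, up), gibbs_weight(J, beta, down)
    cond_mean = (wu - wd) / (wu + wd)           # E(sigma_i | other spins)
    m_i = J[i] @ up                             # local field; J[i, i] = 0
    max_err = max(max_err, abs(cond_mean - np.tanh(beta * m_i)))
print(max_err)
```

The normalizing function $\psi$ cancels in the ratio, which is the reason the pseudolikelihood is computable when the likelihood is not.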

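Theorem 2.1. Take any $\beta > 0$ and $0 < \varepsilon < 1$. Let $\gamma = \max\{\beta\|J\|, 1\}$.
Let $\hat\beta$ be the estimate defined above in (7). Then

$$P_\beta\{|\tanh(C\hat\beta) - \tanh(C\beta)| \ge \varepsilon\} \le \frac{K}{N\varepsilon^2},$$

where we can take $C = 8\beta\|J\|^2/\psi(\beta)$ and $K$ an explicit constant, a sum of
two terms that depend only on $\gamma$ and $\psi(\beta)$.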

Note that Theorem 1.1 follows directly from Theorem 2.1: Suppose the
conditions (a) and (b) of Theorem 1.1 are satisfied. By combining Theorem
2.1 with conditions (a) and (b), we get

$$\tanh(C_N\hat\beta_N) - \tanh(C_N\beta) \to 0 \quad \text{in probability},$$

where $C_N = 8\beta\|J^N\|^2/\psi_N(\beta)$. A simple verification shows that
$\psi_N(\beta) \le \beta\|J^N\|$. Thus, by condition (b), we get $\liminf \|J^N\| > 0$. Combined with
(a), this gives $0 < \liminf C_N \le \limsup C_N < \infty$. It follows that
$\hat\beta_N - \beta = O(N^{-1/2})$.

Let us now begin the proof of Theorem 2.1 with the observation that the
model remains unchanged if we replace $\beta$ by $\beta\|J\|$ and $J$ by $\|J\|^{-1}J$. Thus,
without loss of generality, we can assume that

$$\|J\| = 1. \tag{8}$$

The proof is divided into a sequence of lemmas. There are two main steps:
(1) To show that $S_\sigma(\beta) \approx 0$ with high probability under $P_\beta$ (this is Lemma
1.2), and (2) to show that $S_\sigma(\beta) \approx 0 \Rightarrow \hat\beta \approx \beta$. While the first step is the
conceptual mainstay, the second step is surprisingly hard because of the
extreme lack of information about the model. It is carried out in four steps
(Lemmas 2.2, 2.3, 2.4 and 2.5).

Proof of Lemma 1.2. Recall that we are working under the assumption
that $\|J\| = 1$. Define a function $F : \{-1,1\}^N \times \{-1,1\}^N \to \mathbb{R}$ as

$$F(\tau,\tau') := \frac{1}{2}\sum_{i=1}^N (m_i(\tau) + m_i(\tau'))(\tau_i - \tau'_i).$$

Note that $F$ is antisymmetric, that is, $F(\tau,\tau') = -F(\tau',\tau)$. This will be
useful later on.

Now fix $\beta \ge 0$ and suppose $\sigma$ is drawn from the Gibbs measure at inverse
temperature $\beta$. Since $\beta$ is fixed in this proof, we will write $E$ and $P$ instead
of $E_\beta$ and $P_\beta$.

Now choose a coordinate $I$ uniformly at random, and replace the $I$th
coordinate of $\sigma$ by a sample drawn from the conditional distribution of $\sigma_I$
given $(\sigma_j)_{j \ne I}$. Call the resulting vector $\sigma'$. Then $(\sigma,\sigma')$ is an exchangeable
pair of random variables. Observe that

$$F(\sigma,\sigma') = m_I(\sigma)(\sigma_I - \sigma'_I).$$

Now let

$$f(\sigma) := E(F(\sigma,\sigma') \mid \sigma) = \frac{1}{N}\sum_{i=1}^N m_i(\sigma)(\sigma_i - E(\sigma_i \mid (\sigma_j)_{j \ne i})) = \frac{1}{N}\sum_{i=1}^N m_i(\sigma)(\sigma_i - \tanh(\beta m_i(\sigma))) = S_\sigma(\beta).$$

Then $E(f(\sigma)^2) = E(f(\sigma)F(\sigma,\sigma'))$. Since $(\sigma,\sigma')$ is an exchangeable pair,

$$E(f(\sigma)F(\sigma,\sigma')) = E(f(\sigma')F(\sigma',\sigma)).$$

Again, since $F$ is antisymmetric, we have $E(f(\sigma')F(\sigma',\sigma)) = -E(f(\sigma')F(\sigma,\sigma'))$.
Combining, we have

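$$E(f(\sigma)^2) = E(f(\sigma)F(\sigma,\sigma')) = -E(f(\sigma')F(\sigma,\sigma')) = \frac{1}{2}E\bigl((f(\sigma) - f(\sigma'))F(\sigma,\sigma')\bigr). \tag{9}$$

For any $1 \le j \le N$ and $\tau \in \{-1,1\}^N$, let

$$\tau^{(j)} := (\tau_1, \ldots, \tau_{j-1}, -\tau_j, \tau_{j+1}, \ldots, \tau_N)$$

and

$$p_j(\tau) := \frac{e^{-\beta\tau_j m_j(\tau)}}{e^{\beta\tau_j m_j(\tau)} + e^{-\beta\tau_j m_j(\tau)}}.$$

Then

$$E\bigl((f(\sigma) - f(\sigma'))F(\sigma,\sigma') \mid \sigma\bigr) = \frac{1}{N}\sum_{j=1}^N (f(\sigma) - f(\sigma^{(j)}))F(\sigma,\sigma^{(j)})p_j(\sigma) = \frac{2}{N}\sum_{j=1}^N (f(\sigma) - f(\sigma^{(j)}))m_j(\sigma)\sigma_j p_j(\sigma). \tag{10}$$

Now, for ease of notation, we define the functions $a_i$ and $b_{ij}$ as

$$a_i(\tau) := \tau_i - \tanh(\beta m_i(\tau)) \tag{11}$$

and

$$b_{ij}(\tau) := \tanh(\beta m_i(\tau)) - \tanh(\beta m_i(\tau^{(j)})). \tag{12}$$

Then $f(\tau) = N^{-1}\sum_i m_i(\tau)a_i(\tau)$, and hence

$$f(\sigma) - f(\sigma^{(j)}) = \frac{1}{N}\sum_{i=1}^N (m_i(\sigma) - m_i(\sigma^{(j)}))a_i(\sigma) + \frac{1}{N}\sum_{i=1}^N m_i(\sigma^{(j)})(a_i(\sigma) - a_i(\sigma^{(j)}))$$

$$= \frac{2\sigma_j}{N}\sum_{i=1}^N J_{ij}a_i(\sigma) + \frac{2m_j(\sigma)\sigma_j}{N} - \frac{1}{N}\sum_{i=1}^N m_i(\sigma^{(j)})b_{ij}(\sigma).$$

Let $T_{1j}$, $T_{2j}$ and $T_{3j}$ be the three terms on the last line. Using (9) and (10),
we see that

$$E(f(\sigma)^2) = \frac{1}{N}E\sum_{j=1}^N (T_{1j} + T_{2j} + T_{3j})m_j(\sigma)\sigma_j p_j(\sigma). \tag{13}$$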
i=i 
Now, since $\sum_i a_i(\sigma)^2 \le 4N$ and

$$\sum_j m_j(\sigma)^2 p_j(\sigma)^2 \le \sum_j m_j(\sigma)^2 = \|J\sigma\|^2 \le N,$$

it follows that

$$\Bigl|\frac{1}{N}\sum_{j=1}^N T_{1j}m_j(\sigma)\sigma_j p_j(\sigma)\Bigr| = \Bigl|\frac{2}{N^2}\sum_{i,j=1}^N J_{ij}a_i(\sigma)m_j(\sigma)p_j(\sigma)\Bigr| \le \frac{2}{N^2}\sqrt{4N}\sqrt{N} = \frac{4}{N}.$$



Note that it would be inefficient to simply use the Cauchy-Schwarz inequal- 
ity, because it would not allow us to take advantage of the large amounts of 
"cancellation" between positive and negative terms in the quadratic form. 
Next, let us look at the $T_2$-term. We have

$$\frac{1}{N}\sum_{j=1}^N T_{2j}m_j(\sigma)\sigma_j p_j(\sigma) = \frac{2}{N^2}\sum_{j=1}^N m_j(\sigma)^2 p_j(\sigma) \le \frac{2}{N}.$$



Finally, let us bound the $T_3$-term. Take any $i$ and let $e_i$ be the $i$th coordinate
vector in $\mathbb{R}^N$. Then

$$\sum_{j=1}^N J_{ij}^2 = \|Je_i\|^2 \le \|J\|^2\|e_i\|^2 \le 1.$$

Thus, if we let $J_2$ be the matrix $(J_{ij}^2)_{1 \le i,j \le N}$, then by the well-known result
that the $L^2$ operator norm of a symmetric matrix is bounded by its $L^\infty$
operator norm, we get

$$\|J_2\| \le \max_{1 \le i \le N}\sum_{j=1}^N J_{ij}^2 \le 1.$$

Now let $h(x) := \tanh(\beta x)$. It is easy to verify that $\|h''\|_\infty \le \beta^2$. Therefore,

$$\bigl|h(m_i(\sigma)) - h(m_i(\sigma^{(j)})) - (m_i(\sigma) - m_i(\sigma^{(j)}))h'(m_i(\sigma))\bigr| \le \frac{\beta^2}{2}(m_i(\sigma) - m_i(\sigma^{(j)}))^2.$$

Let $c_i(\sigma) := h'(m_i(\sigma))$, and note that $m_i(\sigma) - m_i(\sigma^{(j)}) = 2J_{ij}\sigma_j$. So the above
inequality can be rewritten as

$$|b_{ij}(\sigma) - 2J_{ij}\sigma_j c_i(\sigma)| \le 2\beta^2 J_{ij}^2. \tag{14}$$

Finally, note that $|c_i(\sigma)| \le \beta$. Using all this information and the bounds on
the operator norms of $J$ and $J_2$, we see that for any $x, y \in \mathbb{R}^N$,



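$$\Bigl|\sum_{i,j} x_i y_j b_{ij}(\sigma)\Bigr| \le \Bigl|\sum_{i,j} x_i y_j (2J_{ij}\sigma_j c_i(\sigma))\Bigr| + \Bigl|\sum_{i,j} x_i y_j (b_{ij}(\sigma) - 2J_{ij}\sigma_j c_i(\sigma))\Bigr|$$

$$\le 2\Bigl(\sum_i (x_i c_i(\sigma))^2\Bigr)^{1/2}\Bigl(\sum_j (y_j\sigma_j)^2\Bigr)^{1/2} + \sum_{i,j} |x_i y_j| 2\beta^2 J_{ij}^2$$

$$\le (2\beta + 2\beta^2)\|x\|\|y\|.$$

Again, it is clear from the definition (12) of $b_{ij}$ that $|b_{ij}(\sigma)| \le 2\beta|J_{ij}|$. Thus,

$$\Bigl|\sum_{i,j} x_i y_j J_{ij} b_{ij}(\sigma)\Bigr| \le \sum_{i,j} |x_i y_j| 2\beta J_{ij}^2 \le 2\beta\|x\|\|y\|.$$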



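Applying these inequalities to the $T_3$-term in (13), we get

$$\Bigl|\frac{1}{N}\sum_{j=1}^N T_{3j}m_j(\sigma)\sigma_j p_j(\sigma)\Bigr| = \Bigl|\frac{1}{N^2}\sum_{i,j=1}^N (m_i(\sigma) - 2J_{ij}\sigma_j)b_{ij}(\sigma)m_j(\sigma)\sigma_j p_j(\sigma)\Bigr| \le \frac{(2\beta + 2\beta^2) + 4\beta}{N}.$$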

Thus, we have computed upper bounds for all terms in (13). Combining, we
have

$$E(f(\sigma)^2) \le \frac{6 + 6\beta + 2\beta^2}{N}. \tag{15}$$

Since $S_\sigma(\beta) = f(\sigma)$, this completes the proof. □

This completes step (1) of the proof. Let us now engage in completing
step (2), that is, showing that $S_\sigma(\beta) \approx 0 \Rightarrow \hat\beta \approx \beta$.

It is not difficult to verify that the function $S_\sigma$ is nonincreasing; therefore,
we only have to show that $S_\sigma$ is sufficiently "nonflat" with high probability.
Again, the proof is divided into two substeps: (2a) Show that if $N^{-1}H(\sigma) \ge c > 0$
for some suitable constant $c$, then $S_\sigma$ is sufficiently nonflat. (2b) Find
$c > 0$ such that $N^{-1}H(\sigma) \ge c$ with high probability.

Lemma 2.2. Suppose $\tau \in \{-1,1\}^N$ and $c > 0$ are such that $N^{-1}H(\tau) \ge c$.
Then for any $\beta \ge 0$, we have

$$|\tanh(2\hat\beta(\tau)/c) - \tanh(2\beta/c)| \le \frac{8|S_\tau(\beta)|}{3c^5}.$$

Proof. Recall the definition (5) of the local fields $m_i$, $i = 1, \ldots, N$. First,
note that

$$\frac{1}{N}\sum_{1 \le i \le N} |m_i(\tau)| \ge \frac{1}{N}\Bigl|\sum_{1 \le i \le N} m_i(\tau)\tau_i\Bigr| = 2N^{-1}|H(\tau)| \ge 2c.$$

Also by assumption (8), we have

$$\sum_i m_i(\tau)^2 = \|J\tau\|^2 \le \|J\|^2\|\tau\|^2 = N.$$

Now recall the Paley-Zygmund second moment inequality (see, e.g., (2.18) in
[25]), which says that for any nonnegative square-integrable random variable
$X$ we have

$$P\{X \ge a\} \ge \frac{(E(X) - a)^2}{E(X^2)} \quad \text{for each } 0 \le a \le E(X).$$

Thus,

$$\frac{\#\{i : |m_i(\tau)| \ge c\}}{N} \ge \frac{(N^{-1}\sum_i |m_i(\tau)| - c)^2}{N^{-1}\sum_i m_i(\tau)^2} \ge c^2.$$

Again, by Chebyshev's bound we have

$$\frac{\#\{i : |m_i(\tau)| \ge 2/c\}}{N} \le \frac{N^{-1}\sum_i m_i(\tau)^2}{(2/c)^2} \le \frac{c^2}{4}.$$

Combining, we get

$$\frac{\#\{i : c \le |m_i(\tau)| \le 2/c\}}{N} \ge \frac{3c^2}{4}.$$

Thus, for every $x \ge 0$,

$$|S'_\tau(x)| = \frac{1}{N}\sum_{1 \le i \le N}\frac{m_i(\tau)^2}{\cosh^2(x m_i(\tau))} \ge \frac{1}{N}\sum_{i : c \le |m_i(\tau)| \le 2/c}\frac{m_i(\tau)^2}{\cosh^2(x m_i(\tau))} \ge \frac{3c^2}{4}\min_{c \le y \le 2/c}\frac{y^2}{\cosh^2(xy)}.$$

Now, the map $y \mapsto y^2/\cosh^2(xy)$ is unimodal, and hence the minimum in
the last expression is attained at either $y = c$ or $y = 2/c$. Surprisingly, the
crude bound obtained by putting $y = c$ in the numerator and $y = 2/c$ in the
denominator can be shown to be the best that one can do in this situation.
Thus, we have

$$|S'_\tau(x)| \ge \frac{3c^4}{4\cosh^2(2x/c)}.$$

Since $S_\tau(0) = 2N^{-1}H(\tau) \ge c > 0$, $S_\tau(\infty) = N^{-1}\sum_i (m_i(\tau)\tau_i - |m_i(\tau)|) \le 0$,
and $S_\tau$ is continuous on $[0,\infty]$, we have $S_\tau(\hat\beta(\tau)) = 0$ whether $\hat\beta(\tau) < \infty$ or
not. Using this, and the monotonicity of $S_\tau$, we see that for any $\beta \ge 0$,

$$|S_\tau(\beta)| = |S_\tau(\beta) - S_\tau(\hat\beta(\tau))| \ge \int_{\beta \wedge \hat\beta(\tau)}^{\beta \vee \hat\beta(\tau)} \frac{3c^4}{4\cosh^2(2x/c)}\,dx = \frac{3c^5}{8}|\tanh(2\hat\beta(\tau)/c) - \tanh(2\beta/c)|.$$

This completes the proof. □

Let us now carry out step (2b), that is, find a positive constant $c$ such
that $N^{-1}H(\sigma) \ge c$ with high probability. The relevant result is Lemma 2.5,
but we need two preliminary lemmas to overcome the problems at critical
temperatures.

Lemma 2.3. For any nondecreasing function $f : \mathbb{R} \to \mathbb{R}$, the quantity
$E_\beta f(H(\sigma))$ is a nondecreasing function of $\beta$.

Proof. Recall the well-known inequality (e.g. [16]) that for any random
variable $X$ and any two nondecreasing functions $f$ and $g$, $\mathrm{cov}(f(X), g(X)) \ge 0$.
A direct computation shows that

$$\frac{d}{d\beta}E_\beta f(H(\sigma)) = \mathrm{cov}_\beta(f(H(\sigma)), H(\sigma)).$$

This completes the proof. □

Lemma 2.4. For any $0 \le \beta_1 \le \beta$ and $c < \psi'(\beta_1)$, we have

$$P_\beta\{N^{-1}H(\sigma) \le c\} \le \frac{\psi''(\beta_1)}{N(\psi'(\beta_1) - c)^2}.$$

Proof. The assumption that $J$ is not the zero matrix implies that $\psi'(\beta)$
and $\psi''(\beta)$ are positive for all $\beta > 0$. First, assume $\beta_1 = \beta$. Standard
calculations give

$$\psi'(\beta) = N^{-1}E_\beta H(\sigma) \quad \text{and} \quad \psi''(\beta) = N^{-1}\mathrm{var}_\beta H(\sigma).$$

A simple application of Chebyshev's inequality completes the proof in this
case. In general, since $\beta_1 \le \beta$, Lemma 2.3 gives

$$P_\beta\{N^{-1}H(\sigma) \le c\} \le P_{\beta_1}\{N^{-1}H(\sigma) \le c\}.$$

This completes the proof. □
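
The "standard calculations" $\psi'(\beta) = N^{-1}E_\beta H(\sigma)$ and $\psi''(\beta) = N^{-1}\mathrm{var}_\beta H(\sigma)$ can be sanity-checked numerically. The following sketch compares central finite differences of $\psi$ with the exact Gibbs mean and variance of $H$ on a small instance; the coupling matrix, step size `h` and function names are assumptions for illustration only.

```python
import itertools
import numpy as np

# Hypothetical small instance: symmetric couplings with zero diagonal.
rng = np.random.default_rng(2)
N, beta, h = 8, 0.6, 1e-3
U = np.triu(rng.standard_normal((N, N)), 1)
J = (U + U.T) / np.sqrt(N)
configs = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
H = 0.5 * np.einsum("ti,ij,tj->t", configs, J, configs)

def psi(b):
    """psi(b) = (1/N) * log( 2^{-N} * sum_tau exp(b * H(tau)) )."""
    return (np.logaddexp.reduce(b * H) - N * np.log(2.0)) / N

def gibbs_mean_var(b):
    """Exact mean and variance of H under the Gibbs measure at parameter b."""
    w = np.exp(b * H - np.max(b * H))
    w /= w.sum()
    mean = w @ H
    return mean, w @ (H - mean) ** 2

mean_H, var_H = gibbs_mean_var(beta)
d1 = (psi(beta + h) - psi(beta - h)) / (2 * h)                 # approximates psi'
d2 = (psi(beta + h) - 2 * psi(beta) + psi(beta - h)) / h ** 2  # approximates psi''
print(d1, mean_H / N)
print(d2, var_H / N)
```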
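We are now ready to state the step (2b) and finish the proof using the two
preceding lemmas and an additional argument which eliminates the need to
consider the derivatives of $\psi$.

Lemma 2.5. For any $\beta > 0$, we have

$$P_\beta\Bigl\{N^{-1}H(\sigma) \le \frac{\psi(\beta)}{4\beta}\Bigr\} \le \frac{8\beta^2}{N\psi(\beta)^3}.$$

Proof. By assumption (8), we have $|H(\sigma)| = \frac{1}{2}|\sigma^T J\sigma| \le \frac{1}{2}\|J\|\|\sigma\|^2 \le \frac{1}{2}N$.
Thus for any $\beta \ge 0$,

$$0 \le \psi'(\beta) = N^{-1}E_\beta H(\sigma) \le \frac{1}{2}.$$

Thus, for any $0 \le \varepsilon \le \beta$,

$$\psi(\beta - \varepsilon) \ge \psi(\beta) - \frac{\varepsilon}{2}.$$

Now, since $\psi'$ is an increasing function,

$$\psi'(\beta - \varepsilon) \ge \frac{\psi(\beta - \varepsilon) - \psi(0)}{\beta - \varepsilon} = \frac{\psi(\beta - \varepsilon)}{\beta - \varepsilon} \ge \frac{\psi(\beta) - (1/2)\varepsilon}{\beta - \varepsilon}. \tag{16}$$

By the mean value theorem, there exists $\beta_1 \in (\beta - \varepsilon, \beta)$ such that

$$\psi''(\beta_1) = \frac{\psi'(\beta) - \psi'(\beta - \varepsilon)}{\varepsilon}.$$

Combining this with the upper bound on $\psi'(\beta)$ and the lower bound on
$\psi'(\beta - \varepsilon)$ obtained above, we get

$$\psi''(\beta_1) \le \frac{(1/2)\beta - \psi(\beta)}{\varepsilon(\beta - \varepsilon)}.$$

Note that the numerator is always nonnegative, since $\psi(\beta) = \int_0^\beta \psi'(u)\,du \le \frac{1}{2}\beta$.
Last of all, observe that $\psi'(\beta_1) \ge \psi'(\beta - \varepsilon)$ by the monotonicity of $\psi'$. Thus, if
$\varepsilon$ and $c$ satisfy

$$0 < \varepsilon < \beta, \quad \psi(\beta) > \frac{\varepsilon}{2} \quad \text{and} \quad c \le \frac{1}{2}\psi'(\beta_1), \tag{17}$$

then Lemma 2.4 gives

$$P_\beta\{N^{-1}H(\sigma) \le c\} \le \frac{4\psi''(\beta_1)}{N\psi'(\beta_1)^2} \le \frac{4((1/2)\beta - \psi(\beta))(\beta - \varepsilon)}{N\varepsilon(\psi(\beta) - (1/2)\varepsilon)^2} \le \frac{2\beta^2}{N\varepsilon(\psi(\beta) - (1/2)\varepsilon)^2}.$$

The proof is completed by putting $\varepsilon = \psi(\beta)$ and $c = \psi(\beta)/(4\beta)$. We only have
to verify that the chosen $\varepsilon$ and $c$ satisfy (17). First, note that $\varepsilon \le \beta/2 < \beta$,
and $\psi(\beta) - \frac{1}{2}\varepsilon = \frac{1}{2}\psi(\beta) > 0$. Also, by (16),

$$c = \frac{\psi(\beta) - (1/2)\varepsilon}{2\beta} \le \frac{\psi(\beta) - (1/2)\varepsilon}{2(\beta - \varepsilon)} \le \frac{1}{2}\psi'(\beta - \varepsilon) \le \frac{1}{2}\psi'(\beta_1).$$

This completes the proof. □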

Combining steps (2a) and (2b), we are now ready to finish the whole of
step (2). The following lemma shows that $S_\sigma(\beta) \approx 0 \Rightarrow \hat\beta \approx \beta$.

Lemma 2.6. For any $\beta > 0$ and $\varepsilon > 0$, we have

$$P_\beta\{|\tanh(C_1\hat\beta) - \tanh(C_1\beta)| \ge \varepsilon\} \le \frac{C_2}{N} + P_\beta\{|S_\sigma(\beta)| \ge C_3\varepsilon\},$$

where $C_1 = 8\beta/\psi(\beta)$, $C_2 = 8\beta^2/\psi(\beta)^3$ and $C_3 = 3\psi(\beta)^5/(2^{13}\beta^5)$.

Proof. From Lemma 2.2, we see that for any positive $\beta$, $\varepsilon$ and $c$,

$$\{\tau : |\tanh(2\hat\beta(\tau)/c) - \tanh(2\beta/c)| \ge \varepsilon\} \subseteq \{\tau : N^{-1}H(\tau) < c\} \cup \{\tau : |S_\tau(\beta)| \ge 3c^5\varepsilon/8\}.$$

Putting $c = \psi(\beta)/(4\beta)$ and applying the bound from Lemma 2.5 completes
the proof. □

Finally, combining Lemmas 2.6 and 1.2, and getting rid of assumption (8)
by substituting $\beta\|J\|$ for $\beta$ and $\hat\beta\|J\|$ for $\hat\beta$, we get Theorem 2.1.